On calculating the probability of a set of orthologous sequences
Abstract
Probabilistic DNA sequence models have been applied intensively in genome research. Within an evolutionary biology framework, this article investigates the feasibility of rigorously estimating the probability of a set of orthologous DNA sequences that evolved from a common progenitor. We propose Monte Carlo integration algorithms that sample the unknown ancestral and/or root sequences a posteriori, conditional on a reference sequence, and apply pairwise Needleman-Wunsch alignment between the sampled sequences and the nonreference species sequences to estimate the probability. We test our algorithms on both simulated and real sequences and compare the probabilities calculated by Monte Carlo integration with those induced by a single multiple alignment.
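To make the two ingredients named in the abstract concrete, here is a minimal sketch of Needleman-Wunsch global alignment scoring and of averaging per-sample log-probabilities, as a Monte Carlo estimate would. The function names, the additive match/mismatch/gap scores, and the linear gap penalty are illustrative assumptions; the abstract does not specify the article's actual scoring scheme or sampling distribution.

```python
import math


def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b.

    Assumes a simple additive scoring scheme with a linear gap penalty
    (illustrative values, not taken from the article).
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def log_mean_exp(log_values):
    """Numerically stable log of the mean of exp(log_values).

    A Monte Carlo estimate averages per-sample probabilities; working in
    log space avoids underflow for long sequences.
    """
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values) / len(log_values))


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATACA"))
    print(log_mean_exp([math.log(0.2), math.log(0.4)]))
```

In a full estimator, each sampled ancestral sequence would be aligned against every nonreference sequence, the alignment would be converted to a log-probability under the chosen evolutionary model, and the per-sample values would be combined with `log_mean_exp`.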
Similar resources
Computation of the Sadhana (Sd) Index of Linear Phenylenes and Corresponding Hexagonal Sequences
The Sadhana index (Sd) is a newly introduced cyclic index. Efficient formulae for calculating the Sd (Sadhana) index of linear phenylenes are given and a simple relation is established between the Sd index of phenylenes and of the corresponding hexagonal sequences.
An Adaptive Approach to Increase Accuracy of Forward Algorithm for Solving Evaluation Problems on Unstable Statistical Data Set
Nowadays, hidden Markov models are extensively used for modeling stochastic processes. These models help researchers establish and implement the desired theoretical foundations using Markov algorithms such as the Forward algorithm. However, using the stability hypothesis and the mean statistic to determine the values of Markov functions on an unstable statistical data set has led to a significant reducti...
A Further Note on Runs in Independent Sequences
Given a sequence of letters generated independently from a finite alphabet, we consider the case when more than one, but not all, letters are generated with the highest probability. The length of the longest run of any of these letters is shown to be one greater than the length of the longest run in a particular state of an associated Markov chain. Using results of Foulser and Karlin (19...
Acquired Antimicrobial Resistance Genes of Escherichia coli Obtained from Nigeria: In silico Genome Analysis
Background: Antimicrobial resistance is a global problem with enormous public health and economic impact. This study was carried out to get an overview of acquired antimicrobial resistance gene sequences in the genomes of Escherichia coli isolated from different food sources and the environment in Nigeria. Methods: To determine the prevalence of acquired antimicrobial resistance genes, genome asse...
Mining Biological Repetitive Sequences Using Support Vector Machines and Fuzzy SVM
Structural repetitive subsequences are the most important portion of biological sequences and play crucial roles in the corresponding sequence's fold and functionality. The biggest class of repetitive subsequences is "Transposable Elements", which has its own sub-classes based on the contexts' structures. Much research has been performed to critically determine the structure and function of repetitiv...